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1. Introduction 

Simulations of QCD on the lattice seem to have reached the stage at which all systematic errors 
can be controlled in a well defined way. This includes control over the continuum limit a — > for 
which calculations at a series of decreasing lattice spacings a have to be performed. Approaching 
the critical point at a = using contemporary algorithms, however, one encounters severe critical 
slowing down. As we will show in the following, it is most prominent in the topological charge, 
whose auto-correlation time increases dramatically as the lattice spacing drops below a a 0.08fm. 
Critical slowing down is a very general problem in Markov chain Monte Carlo simulations and also 
the slowing down of topological modes has been observed before. Recently, Del Debbio, Manca 
and Vicari[l] studied the issue in the CP^ -1 ) model and found evidence for the auto-correlation 
times rising exponentially with the correlation length. Also in Yang-Mills theory [2] and QCD [3] 
large auto-correlations associated with topology have been observed. 

Our interest in the subject was spurred by the observation of a significant slowing down of 
the topological charge in our simulations with two flavors of dynamical Wilson fermions in the 
context of the CLS effort 1 . We are particularly interested in lattice spacings which are fine enough 
to reliably simulate a relativistic charm quark for which we estimate a a 0.04fm to be necessary [4]. 

2. Dynamical simulations 

To illustrate the problem, let us start with the dynamical simulations of Nf = 2 degenerate 
flavors of non-perturbatively improved Wilson quarks and Wilson gauge action. We use three 
values of the coupling constant, j3 = 5.3, 5.5 and 5.7, which correspond to roughly a as 0.08fm, 
0.06fm and 0.04fm respectively [5, 6, 7]. For each coupling constant, several values of the sea 
quark mass are simulated with the DD-HMC algorithm [8], which uses a block decomposition of 
the lattice to separate the infrared from the short distance modes. As a consequence, only gauge 
links within the blocks are updated ("are active")- We will show in the pure gauge study that this 
has no noticeable effect on the problems discussed in this section beyond rescaling by the fraction 
of active links. 

Each of the ensembles was generated with trajectories of length % = 0.5 and after each tra- 
jectory, the lattice is shifted randomly to change the active links. On the saved configurations we 
measure the topological charge using three times HYP smeared gauge fields [9] from which we 
constructed the clover field strength tensor F^ v . Then the charge Q is given by 

Although this definition of the charge does in general not give an integer and is therefore not 
particularly useful for the computation of physical observables, it is known to be well correlated 
with, e.g. the definition of the charge via the number of zero modes of a chiral Dirac operator. In 
the end, the precise interpretation does not matter, since a slow evolution shows a problem of the 
algorithm. Also note that since Q in Eq. 2.1 is parity odd, its expectation value vanishes. 

'https://twiki.cern.ch/twiki/bin/view/CLS 
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Figure 1: Examples of the Monte Carlo time history of the topological charge. The data shown in the left 
plot is from a run on a 64 x 32 3 lattice at j3 =5.3, m K 360MeV. The central plot shows a 64 x 32 3 lattice 
at j8 = 5.5, m n w 460MeV run, the right plot a 128 x 64 3 lattice at j8 = 5.7, m % w 480MeV. 

Examples of the Monte Carlo time history for the three values of the coupling constant are 
given in Fig. 1. At the coarsest lattice spacing a th 0.08fm, the run with roughly 5000 trajecto- 
ries shows quite long correlations, but a quite decent sampling of all topological sectors. At the 
intermediate lattice spacing a m 0.06fm, the auto-correlations get worse and for the finest lattice 
spacing, even 1000 trajectories of length 0.5 do not manage to move the charge considerably. The 
qualitative picture does not depend on the value of the sea quark mass. What matters most is the 
value of the gauge coupling constant. However, due to the high cost of the dynamical simulations 
our time histories are too short to make a definitive statement about the auto-correlation time of the 
topological charge. 

3. Pure gauge study 

Since the sea quark mass seems to have little effect, we now study the problem in pure gauge 
theory. We measure the auto-correlations of the topological charge and of Wilson loops in simu- 
lations with the same DD-HMC algorithm as in the dynamical runs. As a comparison serve some 
runs with the standard HMC algorithm, to check for effects of the block-decomposition, and with 
hybrid over-relaxation (HOR) in order to see whether this is a particular problem of the molecular 
dynamics based algorithms. 

We use Wilson gauge action and start on a 24 4 lattice at jS = 6.0 and match lattices at constant 
physical volume using ro as a scale. With ro = 0.49fm, the lattices of size L/a = 16, 24, 32 and 48 
have therefore a lattice spacing of about 0. 14fm, 0.09fm, 0.07fm and 0.046fm, respectively [10]. 

With the DD-HMC algorithm and a fixed trajectory length of z = 4, we measure the auto- 
correlation time of Q 2 as a function of the lattice spacing. In the next section we discuss the 
method used to determine Ti nt and why we use Q 2 and not Q. Here, and in all other plots, we 
multiply the auto-correlation times by the fraction of active links. In pure gauge theory, also the 
cost of a trajectory scales with this ratio. The result is shown on the left of Fig. 2. We observe a 
steep rise towards the continuum limit, which is roughly compatible with Ti nt °< a~ 5 , but also the 
exponential hypothesis of Ref. [1] can describe the data. Our precision and our range in a are not 
enough to distinguish between the two scenarios. 

The problem is not limited to molecular dynamics based algorithms only. On the right of 
Fig. 2 we show data by M. Liischer and F. Palombi using sweeps combined of heatbath and over- 
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Figure 2: Left: Auto-correlation time of Q 2 and the (0.5fm) x (0.5fm) square Wilson loop as a function 

of the lattice spacing using the DD-HMC algorithm. The two curves are possible functional forms which 

describe the data. At a ss 0.09fm we show values of T; nt (2 2 ) for several domain decompositions. Right: 

Same plot, but using the HOR algorithm on lattices of size (4.8fm) x (2.4fm) 3 . The scales of the y-axis of 

the two plots are unrelated, the line in the right plot is just to guide the eye. 




X t/c 

Figure 3: Left: Dependence of the AC time of Q 2 on the trajectory length for the HMC and DD-HMC 
algorithm on the 24 4 lattices. The units of T{ nt are such that the differences in cost due to the inactive links 
has been scaled out. The values for several block decompositions at the same T slightly shifted for better 
visibility, 6 4 , 6 2 x 12 2 and 12 4 from left to right. Right: Estimator <pg ff j = r o (t)/r o (0)[r n (t/2)/r n (t)} 2 
for (p ,i of the (0.5fm) 2 Wilson loop. 



relaxation steps. The rise of the auto-correlation time is about as steep as for the DD-HMC algo- 
rithm. This indicates a universal problem, because the two algorithms are rather different. 

Since the auto-correlation times are much larger than 0.5, the minimum value, a longer trajec- 
tory length might have a beneficial effect on the performance of the algorithm. The random walk is 
then composed of longer, more directed pieces of evolution. An example for this has already been 
demonstrated in the Schrodinger functional [12]. In Fig. 3 on the left we show the result for (5=6. 
We observe an improvement — in total cost of the simulation — which is roughly compatible with 
the expected 1 / ypt behavior. This gain, however, is far too small to solve the problem. In the same 
plot we put numbers from the DD-HMC algorithm with several block decompositions (6 4 , 6 2 x 12 2 
and 12 4 ) and the HMC algorithm without domain decomposition. All these values are compatible 
with another. The simple rescaling by the fraction of active links accounts for all the differences in 
the auto-correlation times within errors. 



4 



Investigating the critical slowing down of QCD simulations 



Stefan Schaefer 



The question arises in how far other observables are affected. This can be only answered on 
a case by case basis. Here we study rectangular Wilson loops, both, constructed from thin links 
and from once HYP smeared links, in particular the one with the longest auto-correlation time, 
the smeared (0.5fm)x(0.5fm) square loop. Fig. 3 shows the result. On coarse lattices Ti nt of the 
loop and the charge are comparable, however, the loops are much less affected by critical slowing 
down as one approaches the continuum. This can be interpreted as a dynamical decoupling of the 
topological modes from the Wilson loops as the continuum limit is approached. 

4. Stabilized error analysis and dynamical (de)coupling 

The previous sections present the description of a grave problem for lattice simulations. The 
topological charge moves very slowly in the simulations. We have already seen that other observ- 
ables may be affected less, because they are only weakly coupled to this "mode". Here we want 
to discuss what such a decoupling means and how error estimates can be stabilized in this difficult 
situation. 

We consider an observable O = F(A), which is a function of the mean A a = (A a ) of primary 
observables A a = A* a averaged over N subsequent measurements in the Markov chain. The error, 
80 is given by (80) 2 = var(O) 2t "^°) t where the variance is a property of the theory and the 
auto-correlation time a property of the algorithm. It is given [11] by the summed auto-correlation 
function T(t) 




r (t)=Y,F a F fi r ap (t), r aP (t) = ((A a (t)-A a )(Ap(o)-A p )) , 

a/3 

where t is the separation in Monte Carlo time and F a = dF/dA a . 

The function T can be expressed in terms of the transition matrix of the Markov chain, M(q',q), 
which gives the transition probability for a change from a state q to a state q'. Denoting the equi- 
librium probability distribution by W(q), and assuming the algorithm to satisfy detailed balance 
means that the matrix T(q',q) = (W(q'))~ l l 2 M(q' ,q)(W(q)) 1 / 2 is symmetric. Its eigenfunctions 
are labeled % n {q) and its eigenvalues X„. The autocorrelation function can then be written as 

Ta/3 (0 = £ W<a*7n,/3 = £[sign(A„)]' exp(-t/r n ) f]* i a f] n ,p , (4-2) 

n n 

where X n < 1 and z n+ \ < z n = — l/ln|A„| and 

rin,a = (Xn,W l / 2 A a ) = f dqX*(q) W l l 2 {q)A a {q) . (4.3) 

In general all modes n contribute to the sum in Eq. (4.2) and slow modes have x n S> 1 producing a 
long tail in T(t). This is typically hard to estimate: one quickly reaches the point from which the 
uncertainty of T(t) is larger than the signal. Therefore one truncates the sum in Eq. 4.1 at some 
time w. This, however, introduces a bias which can be sizeable if the tail is very long. 

There can be selection rules imposed by common symmetries of the theory (i.e. W(q)) and 
the algorithm (i.e. T(q',q)). Parity is such a case relevant for us: Q couples only to parity odd 
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Figure 4: Left: Extracting the exponential auto-correlation time from an effective mass plot of T(t) of Q 2 
for our 32 4 , x = 1 run with the DD-HMC algorithm. Right: Estimate of Ti nt of the 0.5fm square Wilson loop 
as a function of the summation window according to Eq. 4.4. 



modes, while Q 2 and other parity even observables couple only to even modes. Since in our lattice 
QCD discretization parity is an exact symmetry, parity odd observables vanish and we only need to 
consider even ones. Hence only those n, corresponding to parity even % n {q) contribute in Eq. (4.2). 

In order to learn about the slowest mode in a simulation, we can then investigate an "effective 
mass" plot. 2 Indeed we find a long plateau in l/r e ff = log(rn(f) /T\\{t — 1)) for A\ = Q 2 in Fig. 4. 
It provides a good estimate for 1 / T exp = — log( X\ ) . 

If X\ has been determined well, it can be inserted into the (for even w) exact inequality 

Ti„ t < 2 + 2. r o ( ) \-X\ r o (0) ~ 2 + £ ro(0) + T »p r o ( ) {4A) 

to obtain an upper bound for the error. 3 

In a practical procedure we first select a set of observables (plaquette and Q 2 after various 
smearing levels), determine approximate plateaux in their "effective mass" 1/T e ff and take the 
largest one as the estimate for T exp in Eq. (4.4). Note that if the tail of the auto correlation function 
is dominated by just the slowest mode or by modes with eigenvalues close to it, the upper bound 
will (almost) be saturated. That such an upper bound is not overly conservative is demonstrated in 
Fig. 4 on the right, on the 0.07fm lattice where the T e ff is taken from the topological charge and 
the observable is the 7x7 Wilson loop. We find that generically the upper bound converges faster 
as a function of the window size w than the more standard estimate Ti nt = 1/2 + Y,p=i r°(o) ' ^ e 
therefore advocate its use in difficult situations, e.g. in HMC simulations of QCD. 

Let us now return to the decoupling of some observables from the critical slowing down. Since 
in the representation Eq.(4.2) all modes n contribute, a decoupling from the slowest mode means 
I Lce^a?7i,a| 2 /ro(0) = <Po,i 1. Combining the autocorrelation functions of the Wilson loops 
and those of Q 2 , one can easily form estimators which converge to <po,i for large t. Even if we 
observed this convergence to be rather slow, we found that the estimators for q>o,\ become tiny at 
small lattice spacing, see Fig. 3 (right). This holds for Wilson loops and Creutz ratios. 

2 Experience shows that oscillating terms (sign(/l n ) = — 1) are usually irrelevant at large t. 

3 Actually, the DD-HMC algorithm does not fulfill detailed balance, because of the directed block shifts, and the 
formulae do not apply. However, we expect this to have little effect on the large time behavior of T(t). 
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5. Summary 

In this contribution we have studied the critical slowing down of lattice simulations using the 
HMC and DD-HMC algorithms. We demonstrated a steep increase of the auto-correlation time 
of the squared topological charge in pure gauge simulations which match our observations with 
dynamical fermions. Tuning of the parameters of the algorithm can improve the situation a bit, but 
probably more dramatic changes like the one proposed in [13] will be needed to solve the problem. 

In the second part, we studied the impact of such slowly moving modes on the estimation of the 
errors in Monte Carlo simulations. We proposed a method to give upper bounds on the contribution 
from the tail of the auto-correlation function, which is normally neglected. The resulting estimates 
are neither overly conservative nor impractical and therefore are a safer way to determine the errors. 
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